Designing and Programming Survey Forms

Data, Democracy & Development (DDD) summer field practice course

Ayush Patel

At Azim Premji University

02 Jun, 2025

Hello

I am Ayush.

I am a researcher working at the intersection of data, development and economics.

I am an RStudio (Posit) certified tidyverse instructor.

I am a Researcher at the Oxford Poverty and Human Development Initiative (OPHI), at the University of Oxford.

Did you come prepared?

Do you have a SurveyCTO account? If not, register for a free account here.

Please download the SurveyCTO Collect app on your phone

Learning Goals

Get a working understanding of how components of survey tech work.

Design and program a survey questionnaire to make it available for field use.

Extract the collected data for analyses.

Build an intuition on how to ask questions.

What we will not cover

Advanced methods of communicating with servers using API.

Data encryption for sensitive data.

Building data pipelines that extract, clean and present analyses.

Sampling theory, and analyses.

Components of survey tech

and how it works



But why should I know this stupid tech stuff?

Reckless Learning-1 (20-30 Mins)

  • Form a group of 5-7 members. (2 mins)
  • Navigate to your own SurveyCTO console. Explore the Design, Collect, Monitor, Export and Configure tabs. (3 mins)
  • Choose one person from your group. We will use their SurveyCTO server for this exercise. (1 min)
  • Add the remaining members as team members on the chosen one’s server. (3 mins)
  • All members to login to the chosen server workspace in the SurveyCTO Collect App. (2 mins)
  • Using the Design tab in server console, upload this form. Only to be carried out on the chosen one’s server. (3 mins)
  • Use the SurveyCTO Collect app to get the blank form. (2 mins)
  • Use the SurveyCTO Collect app to record data using the new form for as many people as you can. (5 mins)
  • Use the SurveyCTO Collect app to submit the collected forms. (2 mins)
  • Go to the console of the chosen one’s server. In the Export tab, find the submitted data. Get this data and take a look. (5 mins)

Great Job. We will now explore several available features for designing survey forms.

Form Definition.

The Bare Bones

Say, there is only one single question. Nothing else is of interest to you. Everyone and anyone can be asked this question.

What would the form definition then look like?

First (name, label), you need a short field name and the exact phrasing of the question.

Second (type), what is the type of answer that you expect and accept? (a number, a date, an image?)

Third (constraint), are there any reasonable constraints on the answers? (an age range, a date range?)
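As a rough sketch of these three ingredients, the survey sheet for a single question could hold one row like the one built below. The column names type, name, label and constraint are the ones used in this deck; the field name `age`, its wording and its range are made up for illustration. The sheet is written as an R data frame only to show its shape; in practice you type the same row into the spreadsheet.

```r
# A one-question survey sheet, sketched as a data frame.
# The field name, label and range below are hypothetical.
survey_sheet <- data.frame(
  type       = "integer",                                 # kind of answer accepted
  name       = "age",                                     # short field name, no spaces
  label      = "What is your age in completed years?",    # exact question wording
  constraint = ". >= 0 and . <= 120",                     # reasonable range for the answer
  stringsAsFactors = FALSE
)

print(survey_sheet)
```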

Reckless Learning-2 (5 mins)

Create a copy of the file from the previous example.

Rename the file.

Open the newly renamed file. Remove all rows except the title row from the survey sheet.

As discussed in the previous slide, add a single question in the survey sheet. Populate all necessary columns.

Remove all rows except the title row from the choices sheet.

Add a row in the choices sheet if needed.

Change the form title, and form ID in the settings sheet.

Upload the form on your server and test.

Great Job. We will slowly add more complexity to our forms.

Before we move ahead..

  • I shall introduce the Documentation Home page for SurveyCTO. Home Docs
  • Some important webpages of the documentation to keep at hand while you create form definitions: Core, Additional, Advanced

Expressions

Why?

Lets us calculate on the fly, apply conditional flows to questions, and validate answers.

There are two important ideas from the syntax point of view

First, referring to the value of a specific field.

Second, referring to the value of the current field.

We shall look at how these are used.

Expressions

In the trivial-biryani-survey I use ${enrolled} = 1 to ensure that questions are asked to respondents that report to be enrolled in college.

I also use .>16 and .<130 to ensure age values are validated. Do you think my range is a reasonable one?

Can you find other instances in the biryani survey where I use such expressions?

How is the $ operator different from the . operator?

See this page to browse through applying other logical and mathematical operations.

Constraints

The need to restrict answers to a reasonable set of options or numbers.

How many people live in your house: “56”

Constraints are implemented in the constraint column of the form definition.

A constraint can be accompanied by a message for the enumerator, shown when an answer is rejected. This is done using the constraint message column of the form definition.

Can you find the constraint implemented in the trivial-biryani-survey for number of times a respondent consumes biryani in a week?
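A constraint can also be re-checked in R after export, which is a cheap way to confirm it behaved as intended. The sketch below mirrors the age range `.>16 and .<130` mentioned earlier; the column name `age` and the values are invented for illustration.

```r
# Hypothetical exported responses; only the column name `age` is assumed.
responses <- data.frame(
  id  = 1:5,
  age = c(17, 45, 130, 16, 62)
)

# R analogue of the form constraint `.>16 and .<130`:
# TRUE when the answer would have been accepted by the form.
passes_constraint <- responses$age > 16 & responses$age < 130

# Rows that the form should have rejected in the field
responses[!passes_constraint, ]
```

If the form worked as designed, the last line returns zero rows on real exports.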

Relevance

Often we want to put a question only to a specific subset of respondents.

We can use expressions to change the flow of the survey questions so that we ask the right question to the right people.

Expressions are provided in the relevance column for a question or group of questions.

"Continue only on consent" is the most common, most important, and easiest relevance condition to implement; it is found in all decent surveys.

See the relevance column in the trivial-biryani-survey for an example.
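Skip logic can be sanity-checked in the exported data too. A minimal sketch, assuming that a question skipped by relevance arrives as a missing value once the export is read into R; the follow-up column `college_name` and the data are hypothetical, while `enrolled` follows the `${enrolled} = 1` example used earlier.

```r
# Hypothetical export: `college_name` is asked only when ${enrolled} = 1.
responses <- data.frame(
  id           = 1:4,
  enrolled     = c(1, 0, 1, 0),
  college_name = c("A", NA, "B", NA),
  stringsAsFactors = FALSE
)

# R analogue of the relevance expression `${enrolled} = 1`
was_relevant <- responses$enrolled == 1

# A skipped question should be empty, an asked one should be filled.
skip_logic_ok <- was_relevant == (!is.na(responses$college_name))

all(skip_logic_ok)
```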

Reckless Learning-3 (25-30 mins)

Create a form definition for a door-to-door household interview.

You are broadly interested in the access to banking and spending autonomy of stay-at-home married women.

You want to capture details of education, age, marital status, and age at marriage of the target group and their spouse.

You want information on holding a bank account, debit cards, credit cards, and UPI for the target group and their spouse.

For the target group you want to record who opened their account, and whether they opened an (additional or new) account after marriage.

Does the target group have any sources of income/savings independent of their spouse?

How often does the target group spend on indulgence expenses (not for their kids' education, not for HH expenses, etc.), only for themselves? Ask for reasons if they don't.

Begin by making a flow chart of questions, phrase the questions, and use constraints and relevance as required.

Upload the form on SurveyCTO and test it. Does it work as expected?

Grouping

Organization

Easy Navigation

Less Duplication

Fewer copy-paste errors

type          name    label    relevance
begin group   gname   Group1   ${child}=1
..fields..    f1      what…
..fields..    f2      how…
end group

Repeating Questions

type           name    label    relevance    repeat_count
begin repeat   gname   Group1   ${child}=1   ${num_child}
..fields..     f1      what…
..fields..     f2      how…
end repeat

see docs

Calculate

type        name       calculation      relevance
calculate   num_rand   once(random())
integer     treat                       ${num_rand}<=0.5
integer     control                     ${num_rand}>0.5

see docs
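The logic of the table above can be tried out in R before programming the form. This is only an analogue, not how SurveyCTO evaluates the form: `runif()` stands in for `once(random())`, and the seed and number of respondents are arbitrary choices for the sketch.

```r
set.seed(2025)  # only so the sketch is reproducible

n_respondents <- 10

# One uniform draw between 0 and 1 per respondent,
# playing the role of once(random()) in the form
num_rand <- runif(n_respondents)

# Mirror the two relevance expressions: <= 0.5 goes to treat, > 0.5 to control
arm <- ifelse(num_rand <= 0.5, "treat", "control")

table(arm)
```

Each respondent lands in exactly one arm because the two conditions are mutually exclusive and cover the whole range of the draw.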

Pre-Loading data

Allows us to use data, collected in previous surveys or otherwise, in survey forms.

Attach one or multiple supporting data files as csv or as spreadsheets or server data.

The attached data must have the first row as unique column names for the data.

There must exist a column to uniquely identify each row.

Every column from the data that is required in the survey form gets its own calculate field in the form definition.

Pre-Loading data

ind_id score_bir
281 10
212 2
271 2
181 1
245 6

Baseline.csv

Pre-Loading data

You are interested in figuring out the right condiment for the new biryani that has been created.

As a baseline, you have a group of individuals who have each given a score to the new biryani, consumed on its own.

It is known that people who score less than 6 never order the product.

So, in your revisit you want to meet people who have scored it less than 6 and offer one of the two condiments at random (salan or raita).

Then ask them to score again. Finally, see if a condiment is able to get the average scores above 6.

Pre-Loading data

pulldata() is the function that allows us to get additional data into the survey form.

pulldata(name of the file, column of interest, uid column in data, uid value to match)

pulldata('Baseline', 'score_bir', 'ind_id', ${id}) is provided in the calculation column of the form.
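To build intuition for what pulldata() returns, here is an R analogue of the lookup using the Baseline data shown earlier. The helper `pull_value()` is hypothetical and written only for this sketch; it is not part of SurveyCTO or of any package.

```r
# The baseline data shown earlier in these slides
baseline <- data.frame(
  ind_id    = c(281, 212, 271, 181, 245),
  score_bir = c(10, 2, 2, 1, 6)
)

# Hypothetical helper: return `column` from the row where `key_col`
# equals `key_value`. Returns NA when no row matches.
pull_value <- function(data, column, key_col, key_value) {
  row <- match(key_value, data[[key_col]])
  data[[column]][row]
}

pull_value(baseline, "score_bir", "ind_id", 212)  # 2
pull_value(baseline, "score_bir", "ind_id", 999)  # NA, no such id
```

The second call shows why the uid column must identify each row uniquely and why a mistyped uid silently finds nothing.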

Reckless Learning-4

  • Design a baseline form that records biryani scores given by individuals.
  • Each individual should have a uid, say, a mobile number.
  • Upload this form and survey peers.
  • Export data.
  • Design a follow up form, where:
    • You identify individuals by asking their uid.
    • State their scores from the previous tasting.
    • If the scores are below 6, randomly offer tasting again with salan or raita and get scores again.

Pre-Loading data

It is unreasonable to expect the respondent or the enumerator to recall the uid accurately every time.

We need the ability to search through the additional data to identify the right respondents.

Pre-Loading data

The enumerator should have the ability to search through the possible respondents of a given area (district, village, ward, etc.).

district_name district_id village village_id hhid
D1 1 v1 1 100
D1 1 v2 2 101
D1 1 v3 3 102
D1 1 v4 4 103
D1 1 v5 5 104
D2 2 v6 6 105
D2 2 v7 7 106
D2 2 v8 8 107
D2 2 v9 9 108
D2 2 v10 10 109

listing.csv

Pre-Loading data

type                  name          label                    appearance
select_one district   choose_dist   enumerator choose dist   search('listing')

survey

list_name   value         label
district    district_id   district_name

choices

Pre-Loading data

type                 name             label                       appearance
select_one village   choose_village   enumerator choose village   search('listing', 'matches', 'district_id', ${choose_dist})

survey

list_name   value        label
village     village_id   village

choices

Pre-Loading data

type            name        label                  appearance
select_one HH   choose_HH   enumerator choose HH   search('listing', 'matches', 'village_id', ${choose_village})

survey

list_name   value   label
HH          hhid    hhid

choices
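The cascade of district, village and household can be pictured as successive filters on listing.csv. The R sketch below is only an analogue of what the enumerator sees at each step, not how the app implements search(); the chosen district and village are arbitrary.

```r
# listing.csv from the slides, typed in by hand
listing <- data.frame(
  district_name = rep(c("D1", "D2"), each = 5),
  district_id   = rep(1:2, each = 5),
  village       = paste0("v", 1:10),
  village_id    = 1:10,
  hhid          = 100:109
)

# Step 1: the enumerator picks a district
choose_dist <- 2

# Step 2: only villages whose district_id matches are offered
villages <- listing[listing$district_id == choose_dist, c("village_id", "village")]

# Step 3: after a village is picked, only its households are offered
choose_village <- 7
households <- listing[listing$village_id == choose_village, "hhid"]

villages$village  # "v6" "v7" "v8" "v9" "v10"
households        # 106
```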

Data Cleaning Tips

{janitor}

Use the janitor package:

  • Clean column names
  • Find Duplicates
  • Compare data frames for same columns before row binding
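A short sketch of the three janitor uses listed above, on made-up data. The functions `clean_names()`, `get_dupes()`, `compare_df_cols()` and `compare_df_cols_same()` come from the janitor package; the data frames and column names are hypothetical.

```r
library(janitor)

# Hypothetical raw export with messy column names and a repeated id
raw <- data.frame(
  `Respondent ID` = c(1, 2, 2, 3),
  `Age (Years)`   = c(25, 31, 31, 40),
  check.names = FALSE
)

# 1. Clean column names: lower case, underscores, no special characters
clean <- clean_names(raw)
names(clean)

# 2. Find duplicates on the id column
get_dupes(clean, respondent_id)

# 3. Compare column types across data frames before binding rows.
#    Here respondent_id is numeric in one and character in the other.
other <- data.frame(respondent_id = c("4", "5"), age_years = c(22, 58))
compare_df_cols(clean, other)
compare_df_cols_same(clean, other)
```

The last call returns FALSE for these two data frames, which is the signal to fix the types before row binding.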

Data Checks

  • Make sure all your column types are appropriate
  • Check data summaries - range, max, min, unique values etc
  • Do you have the data collected on the right dates?
  • Is the data tidy?
  • Changing choice values to labels
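Several of these checks take only a few lines of base R. The sketch below uses an invented export; the column names, the dates and the fieldwork window are all assumptions made for illustration.

```r
# Hypothetical export; column names and values are made up
responses <- data.frame(
  id          = c("a1", "a2", "a3"),
  age         = c("25", "31", "40"),   # numbers often arrive as text
  survey_date = c("2025-06-02", "2025-06-03", "2025-05-20"),
  stringsAsFactors = FALSE
)

# Column types: convert where the type is not appropriate
responses$age         <- as.numeric(responses$age)
responses$survey_date <- as.Date(responses$survey_date)
sapply(responses, class)

# Summaries: range and unique values
range(responses$age)
unique(responses$id)

# Dates: flag submissions outside the planned fieldwork window (assumed here)
field_start <- as.Date("2025-06-01")
field_end   <- as.Date("2025-06-15")
outside_window <- responses$survey_date < field_start |
  responses$survey_date > field_end
responses[outside_window, ]
```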

Choice values to Choice labels

# A tibble: 4 × 2
  q1    q2   
  <chr> <chr>
1 1     1 4  
2 1     3    
3 2     2 4  
4 4     2 3 4
# A tibble: 8 × 3
  list_name value label 
  <chr>     <dbl> <chr> 
1 flavour       1 red   
2 flavour       2 yellow
3 flavour       3 green 
4 flavour       4 blue  
5 months        1 Jan   
6 months        2 Feb   
7 months        3 Mar   
8 months        4 Apr   
# A tibble: 2 × 2
  type                   name 
  <chr>                  <chr>
1 select_one flavour     q1   
2 select_multiple months q2   

Choice values to Choice Labels - get your ducks in a row

# join the survey and choice tables

left_join(choices,
          survey|>
            mutate(
              type = str_extract(type, "\\s.{1,}$")|>
                str_squish()
            ),
          join_by(list_name == type)) -> survey_choices

survey_choices
# A tibble: 8 × 4
  list_name value label  name 
  <chr>     <dbl> <chr>  <chr>
1 flavour       1 red    q1   
2 flavour       2 yellow q1   
3 flavour       3 green  q1   
4 flavour       4 blue   q1   
5 months        1 Jan    q2   
6 months        2 Feb    q2   
7 months        3 Mar    q2   
8 months        4 Apr    q2   

Choice values to Choice Labels - Function to replace single opt choices

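replace_label_single <- function(nm){

  # raw choice values recorded for question `nm`, as numbers
  # (relies on `data` and `survey_choices` from the previous slides)
  a <- as.numeric(data[[nm]])

  # look up the label of each value in the joined survey-choices table
  map_chr(
    a,
    ~ survey_choices |>
      filter(name == nm, value == .x) |>
      pull(label)
  )

}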

Choice values to Choice Labels - apply single opt replace

replace_label_single("q1")
[1] "red"    "red"    "yellow" "blue"  
data|>
  mutate(
    q1 = replace_label_single("q1")
  )
# A tibble: 4 × 2
  q1     q2   
  <chr>  <chr>
1 red    1 4  
2 red    3    
3 yellow 2 4  
4 blue   2 3 4

Choice values to Choice Labels - Function to replace multiple opt choices

[1] "Jan Apr"     "Mar"         "Feb Apr"     "Feb Mar Apr"
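The output above comes from a replace_label_multiple() helper whose body is not shown in the slides. One way such a function could be written is sketched below; it is a guess at the approach, not the original implementation. The example data and lookup table are rebuilt by hand so the block runs on its own.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# The example data and the joined lookup table from the previous slides
data <- tibble(
  q1 = c("1", "1", "2", "4"),
  q2 = c("1 4", "3", "2 4", "2 3 4")
)

survey_choices <- tibble(
  list_name = rep(c("flavour", "months"), each = 4),
  value     = rep(c(1, 2, 3, 4), times = 2),
  label     = c("red", "yellow", "green", "blue", "Jan", "Feb", "Mar", "Apr"),
  name      = rep(c("q1", "q2"), each = 4)
)

replace_label_multiple <- function(nm) {

  # value -> label lookup for this question only
  lookup <- survey_choices |>
    filter(name == nm)

  # tidy whitespace, split each answer on spaces,
  # translate every value, then glue the labels back together
  data[[nm]] |>
    str_squish() |>
    str_split("\\s+") |>
    map_chr(
      ~ lookup$label[match(as.numeric(.x), lookup$value)] |>
        str_c(collapse = " ")
    )
}

replace_label_multiple("q2")
```

Splitting on spaces works because a select_multiple answer is stored as space-separated choice values, as in the q2 column of the example data.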

Choice values to Choice Labels - apply multiple opt replace

data|>
  mutate(
    q1 = replace_label_single("q1"),
    q2 = replace_label_multiple("q2")
  )
# A tibble: 4 × 2
  q1     q2         
  <chr>  <chr>      
1 red    Jan Apr    
2 red    Mar        
3 yellow Feb Apr    
4 blue   Feb Mar Apr

Data Viz tips

  • Choosing the right geometry
    • Always show the number of observations in the chart
    • Chart titles, subtitles and captions are available for use
    • Emphasis on the good, ugly and wrong chart type
    • Choice of colours (contrast and colour blindness)
    • janitor, gt, reactable, kable for tables
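A few of these tips can be combined in one ggplot2 chart. The sketch below shows the number of observations in the subtitle, uses a title and caption, and picks a palette intended to stay readable under colour blindness. The scores are invented purely to have something to plot, and the boxplot is just one possible geometry.

```r
library(ggplot2)

# Hypothetical scores, invented only for this sketch
scores <- data.frame(
  condiment = rep(c("salan", "raita"), times = c(6, 4)),
  score     = c(7, 6, 8, 5, 7, 9, 4, 6, 5, 7)
)

n_obs <- nrow(scores)

p <- ggplot(scores, aes(x = condiment, y = score, fill = condiment)) +
  geom_boxplot() +
  # viridis palettes are designed with colour blindness in mind
  scale_fill_viridis_d(option = "cividis") +
  labs(
    title    = "Biryani scores by condiment",
    subtitle = paste0("n = ", n_obs, " respondents"),
    caption  = "Illustrative data, not survey results"
  ) +
  theme_minimal()

p
```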

Developing good taste for data viz

Walk through

Fin.